Skip to content

Conversation

binmahone
Copy link
Collaborator

this PR closes #12557, pls check out the descriptions there

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
@binmahone
Copy link
Collaborator Author

build

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
@binmahone
Copy link
Collaborator Author

build

winningsix
winningsix previously approved these changes Apr 22, 2025
@abellina
Copy link
Collaborator

abellina commented Apr 22, 2025

I believe we should mark the config internal as it doesn't seem like it is a user setting. You'd have to know that the partial aggregate has two stages, and that the first stage does some aggregation and that there are more stages that further combine these partial aggregates into bigger partial aggregates.

Side comment, this log at INFO level seems a little odd. Shouldn't this be at DEBUG level? https://github.com/NVIDIA/spark-rapids/blob/branch-25.06/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuAggregateExec.scala#L995 and the INFO level one should be the case where we do skip? https://github.com/NVIDIA/spark-rapids/blob/branch-25.06/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuAggregateExec.scala#L986

@sameerz sameerz added the performance A performance related task/issue label Apr 22, 2025
@binmahone
Copy link
Collaborator Author

I believe we should mark the config internal as it doesn't seem like it is a user setting. You'd have to know that the partial aggregate has two stages, and that the first stage does some aggregation and that there are more stages that further combine these partial aggregates into bigger partial aggregates.

will do

Side comment, this log at INFO level seems a little odd. Shouldn't this be at DEBUG level? https://github.com/NVIDIA/spark-rapids/blob/branch-25.06/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuAggregateExec.scala#L995 and the INFO level one should be the case where we do skip? https://github.com/NVIDIA/spark-rapids/blob/branch-25.06/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuAggregateExec.scala#L986

These two logs are both helpful when troubleshooting the skip agg case. How about I changing both to INFO level? I don't know what role DEBUG level logging are supposed to play for us. To me, it would be nice if a log in user production is informative enough for most troubleshootings. Asking user to run it again with DEBUG logging settings would be impractical sometimes because it might be very costly.

@binmahone
Copy link
Collaborator Author

I believe we should mark the config internal as it doesn't seem like it is a user setting. You'd have to know that the partial aggregate has two stages, and that the first stage does some aggregation and that there are more stages that further combine these partial aggregates into bigger partial aggregates.

will do

Side comment, this log at INFO level seems a little odd. Shouldn't this be at DEBUG level? https://github.com/NVIDIA/spark-rapids/blob/branch-25.06/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuAggregateExec.scala#L995 and the INFO level one should be the case where we do skip? https://github.com/NVIDIA/spark-rapids/blob/branch-25.06/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuAggregateExec.scala#L986

These two logs are both helpful when troubleshooting the skip agg case. How about I changing both to INFO level? I don't know what role DEBUG level logging are supposed to play for us. To me, it would be nice if a log in user production is informative enough for most troubleshootings (unless it's very verbose). Asking user to run it again with DEBUG logging settings would be impractical sometimes because it might be very costly.

@abellina
Copy link
Collaborator

I believe we should mark the config internal as it doesn't seem like it is a user setting. You'd have to know that the partial aggregate has two stages, and that the first stage does some aggregation and that there are more stages that further combine these partial aggregates into bigger partial aggregates.

will do

Side comment, this log at INFO level seems a little odd. Shouldn't this be at DEBUG level? https://github.com/NVIDIA/spark-rapids/blob/branch-25.06/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuAggregateExec.scala#L995 and the INFO level one should be the case where we do skip? https://github.com/NVIDIA/spark-rapids/blob/branch-25.06/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuAggregateExec.scala#L986

These two logs are both helpful when troubleshooting the skip agg case. How about I changing both to INFO level? I don't know what role DEBUG level logging are supposed to play for us. To me, it would be nice if a log in user production is informative enough for most troubleshootings (unless it's very verbose). Asking user to run it again with DEBUG logging settings would be impractical sometimes because it might be very costly.

That's fine with me. It's once per task for the first batch only, so it should be fine for now to make them both INFO

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
@binmahone
Copy link
Collaborator Author

build

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
@binmahone
Copy link
Collaborator Author

build

@binmahone
Copy link
Collaborator Author

build

@binmahone binmahone merged commit 6ce344b into NVIDIA:branch-25.06 May 6, 2025
54 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance A performance related task/issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEA] by default enable skip agg when the inputs of agg cannot reduce

5 participants